#store string containing all required packages
my_packages <- c('rtweet', 'ggplot2', 'igraph', 'RColorBrewer', 'readr', 'bipartite')
Figure out which of these packages is already installed
# Store all installed packages
ya_installed <- library()$results[,1]
# Check whether required packages are already installed and grab only those that still need installation
need_install<-my_packages[!(my_packages %in% ya_installed)]
#install required packages
lapply({need_install}, install.packages, character.only = TRUE)
## list()
# Store all installed packages
ya_loaded <- (.packages())
# Check whether required packages are already installed and grab only those that still need installation
need_load<-my_packages[!(my_packages %in% ya_loaded)]
# Load required packages
lapply(need_load, require, character.only = TRUE)
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] TRUE
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] TRUE
#also load the readr library
library(readr)
library(bipartite)
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(igraph)
library(reshape2)
Our data comes from the Fed’s MBS purchase records. Each observation corresponds to a security the Fed purchased, including: the CUSIP identifier, purchase date, sale date (if sold), and the breakdown of the underlying mortgage principal across U.S. states and territories (in dollars). For example, a row might indicate the Fed bought Security X on 2010-05-15, with 30% of its mortgages from California, 10% from Florida, 5% from New York, etc. First, we load and clean the data (e.g., converting strings to dates and numbers). We compute each MBS’s total HoldingDays (sale date – purchase date). Then we reshape the state columns from wide to long format for analysis: each security becomes multiple records like (CUSIP, State, principal% of that MBS). Here’s a glimpse after cleaning:
# Load data (all columns read as characters initially)
data <- read_csv("org_full_time_series.csv", col_types = cols(.default = col_character()))
data <- data %>% select(-c(228,229,230,231))
# Convert numeric columns (e.g., UPB, face values, etc.) by removing "$" and ","
numeric_cols <- names(data)[7:ncol(data)] # assuming numeric data start from column 7
data[numeric_cols] <- data[numeric_cols] %>%
mutate(across(everything(), ~parse_number(.)))
# Merge duplicate DC columns if present
if("District Of Columbia_Aggregate_UPB" %in% names(data)) {
data <- data %>%
mutate(`District of Columbia_Aggregate_UPB` = coalesce(`District of Columbia_Aggregate_UPB`, `District Of Columbia_Aggregate_UPB`),
`District of Columbia_Percent_UPB` = coalesce(`District of Columbia_Percent_UPB`, `District Of Columbia_Percent_UPB`),
`District of Columbia_Number_of_Loans` = coalesce(`District of Columbia_Number_of_Loans`, `District Of Columbia_Number_of_Loans`),
`District of Columbia_Percent_Loans` = coalesce(`District of Columbia_Percent_Loans`, `District Of Columbia_Percent_Loans`)) %>%
select(-starts_with("District Of Columbia_"))
}
# Convert date columns to Date type
data <- data %>%
mutate(`Purchase Date` = as.Date(`Purchase Date`),
`Sale Date` = as.Date(`Sale Date`))
# Create HoldingDays variable (difference between Sale Date and Purchase Date)
data_filtered <- data %>% mutate(HoldingDays = as.numeric(`Sale Date` - `Purchase Date`))
# Peek at the data structure
glimpse(data)
## Rows: 31,713
## Columns: 219
## $ CUSIP <chr> "3128M5KV1", "31292KML8", "3128M7EK8"…
## $ `Purchase Date` <date> 2009-09-30, 2009-09-30, 2009-09-30, …
## $ `Sale Date` <date> 2023-01-18, 2019-11-20, 2019-11-20, …
## $ `Security Description` <chr> "FHLMC PC GOLD 6% 08/37", "FHLMC PC G…
## $ Term <chr> "30yr", "30yr", "30yr", "30yr", "15yr…
## $ `Purchase Face Value` <chr> "187837097.5", "120842681.55", "97968…
## $ `Sale Face Value` <dbl> 2906060.00, 5430483.38, 7399348.21, 3…
## $ Alabama_Aggregate_UPB <dbl> NA, NA, 668340.35, 27300525.55, 44301…
## $ Alabama_Percent_UPB <dbl> NA, NA, 0.57, 0.46, 0.65, 0.87, NA, 3…
## $ Alabama_Number_of_Loans <dbl> NA, NA, 64, 113, 30, 21, NA, 13, 15, …
## $ Alabama_Percent_Loans <dbl> NA, NA, 1.02, 0.56, 0.73, 1.11, NA, 3…
## $ Alaska_Aggregate_UPB <dbl> NA, NA, 686581.8, 54830508.6, NA, NA,…
## $ Alaska_Percent_UPB <dbl> NA, NA, 0.58, 0.93, NA, NA, NA, NA, N…
## $ Alaska_Number_of_Loans <dbl> NA, NA, 44, 191, NA, NA, NA, NA, NA, …
## $ Alaska_Percent_Loans <dbl> NA, NA, 0.70, 0.94, NA, NA, NA, NA, N…
## $ Arizona_Aggregate_UPB <dbl> NA, 6464780.1, 2223283.5, 72709482.1,…
## $ Arizona_Percent_UPB <dbl> NA, 0.93, 1.89, 1.23, 1.72, 1.10, 1.5…
## $ Arizona_Number_of_Loans <dbl> NA, 27, 103, 290, 58, 22, 20, 4, 12, …
## $ Arizona_Percent_Loans <dbl> NA, 0.85, 1.63, 1.42, 1.42, 1.17, 1.4…
## $ Arkansas_Aggregate_UPB <dbl> NA, NA, NA, 27435617.6, NA, NA, 20838…
## $ Arkansas_Percent_UPB <dbl> NA, NA, NA, 0.47, NA, NA, 0.87, 3.46,…
## $ Arkansas_Number_of_Loans <dbl> NA, NA, NA, 113, NA, NA, 10, 15, 10, …
## $ Arkansas_Percent_Loans <dbl> NA, NA, NA, 0.56, NA, NA, 0.73, 3.56,…
## $ California_Aggregate_UPB <dbl> NA, 21700106.3, 12544263.9, 156222397…
## $ California_Percent_UPB <dbl> NA, 3.12, 10.68, 26.51, 6.58, 7.55, 9…
## $ California_Number_of_Loans <dbl> NA, 83, 466, 4420, 154, 95, 80, 10, 2…
## $ California_Percent_Loans <dbl> NA, 2.61, 7.39, 21.72, 3.76, 5.03, 5.…
## $ Colorado_Aggregate_UPB <dbl> NA, 5139628.8, 4604514.1, 227337233.2…
## $ Colorado_Percent_UPB <dbl> NA, 0.74, 3.92, 3.86, 2.00, 2.04, 1.4…
## $ Colorado_Number_of_Loans <dbl> NA, 23, 205, 844, 63, 37, 15, 6, 18, …
## $ Colorado_Percent_Loans <dbl> NA, 0.72, 3.25, 4.15, 1.54, 1.96, 1.0…
## $ Connecticut_Aggregate_UPB <dbl> NA, 9973831.4, 1285096.4, 57659205.8,…
## $ Connecticut_Percent_UPB <dbl> NA, 1.43, 1.09, 0.98, 1.22, 1.20, 0.7…
## $ Connecticut_Number_of_Loans <dbl> NA, 43, 71, 189, 42, 20, 7, NA, 8, 4,…
## $ Connecticut_Percent_Loans <dbl> NA, 1.35, 1.13, 0.93, 1.03, 1.06, 0.5…
## $ Delaware_Aggregate_UPB <dbl> NA, 5743523.9, NA, 26070316.8, NA, NA…
## $ Delaware_Percent_UPB <dbl> NA, 0.82, NA, 0.44, NA, NA, NA, NA, N…
## $ Delaware_Number_of_Loans <dbl> NA, 25, NA, 96, NA, NA, NA, NA, NA, 4…
## $ Delaware_Percent_Loans <dbl> NA, 0.79, NA, 0.47, NA, NA, NA, NA, N…
## $ Florida_Aggregate_UPB <dbl> NA, 10589908.5, 3561820.4, 108253948.…
## $ Florida_Percent_UPB <dbl> NA, 1.52, 3.03, 1.84, 2.53, 1.29, 1.4…
## $ Florida_Number_of_Loans <dbl> NA, 47, 137, 419, 101, 28, 17, 14, 16…
## $ Florida_Percent_Loans <dbl> NA, 1.48, 2.17, 2.06, 2.47, 1.48, 1.2…
## $ Georgia_Aggregate_UPB <dbl> NA, 6745409.86, 3901019.98, 161236018…
## $ Georgia_Percent_UPB <dbl> NA, 0.97, 3.32, 2.74, 2.91, 3.46, 2.6…
## $ Georgia_Number_of_Loans <dbl> NA, 33, 394, 596, 128, 68, 43, 15, 18…
## $ Georgia_Percent_Loans <dbl> NA, 1.04, 6.25, 2.93, 3.13, 3.60, 3.1…
## $ Guam_Aggregate_UPB <dbl> NA, 4315916, NA, NA, NA, NA, NA, NA, …
## $ Guam_Percent_UPB <dbl> NA, 0.62, NA, NA, NA, NA, NA, NA, NA,…
## $ Guam_Number_of_Loans <dbl> NA, 20, NA, NA, NA, NA, NA, NA, NA, N…
## $ Guam_Percent_Loans <dbl> NA, 0.63, NA, NA, NA, NA, NA, NA, NA,…
## $ Hawaii_Aggregate_UPB <dbl> NA, 4002131.1, 1178502.9, 41131196.9,…
## $ Hawaii_Percent_UPB <dbl> NA, 0.57, 1.00, 0.70, NA, NA, NA, NA,…
## $ Hawaii_Number_of_Loans <dbl> NA, 14, 198, 113, NA, NA, NA, NA, NA,…
## $ Hawaii_Percent_Loans <dbl> NA, 0.44, 3.14, 0.56, NA, NA, NA, NA,…
## $ Idaho_Aggregate_UPB <dbl> NA, NA, NA, 46282025.5, NA, 2115365.5…
## $ Idaho_Percent_UPB <dbl> NA, NA, NA, 0.79, NA, 0.56, NA, 2.88,…
## $ Idaho_Number_of_Loans <dbl> NA, NA, NA, 194, NA, 15, NA, 12, 7, 1…
## $ Idaho_Percent_Loans <dbl> NA, NA, NA, 0.95, NA, 0.79, NA, 2.85,…
## $ Illinois_Aggregate_UPB <dbl> NA, 80226728.5, 6934001.1, 105953557.…
## $ Illinois_Percent_UPB <dbl> NA, 11.52, 5.90, 1.80, 11.52, 8.22, 6…
## $ Illinois_Number_of_Loans <dbl> NA, 356, 315, 391, 514, 153, 93, 10, …
## $ Illinois_Percent_Loans <dbl> NA, 11.19, 5.00, 1.92, 12.55, 8.11, 6…
## $ Indiana_Aggregate_UPB <dbl> NA, 22235271.6, 4700892.3, NA, 301872…
## $ Indiana_Percent_UPB <dbl> NA, 3.19, 4.00, NA, 4.41, 2.04, 3.02,…
## $ Indiana_Number_of_Loans <dbl> NA, 113, 134, NA, 229, 46, 49, 6, 14,…
## $ Indiana_Percent_Loans <dbl> NA, 3.55, 2.13, NA, 5.59, 2.44, 3.56,…
## $ Iowa_Aggregate_UPB <dbl> NA, 4815682.0, 1161811.9, 51332910.9,…
## $ Iowa_Percent_UPB <dbl> NA, 0.69, 0.99, 0.87, 1.15, 2.23, 1.1…
## $ Iowa_Number_of_Loans <dbl> NA, 25, 86, 230, 60, 55, 22, 7, 14, 1…
## $ Iowa_Percent_Loans <dbl> NA, 0.79, 1.36, 1.13, 1.46, 2.91, 1.6…
## $ Kansas_Aggregate_UPB <dbl> NA, 8158904.0, NA, NA, 7330728.9, 276…
## $ Kansas_Percent_UPB <dbl> NA, 1.17, NA, NA, 1.07, 0.73, 1.08, 0…
## $ Kansas_Number_of_Loans <dbl> NA, 39, NA, NA, 56, 16, 19, 4, NA, NA…
## $ Kansas_Percent_Loans <dbl> NA, 1.23, NA, NA, 1.37, 0.85, 1.38, 0…
## $ Kentucky_Aggregate_UPB <dbl> NA, 18922944.3, 2408359.7, NA, 242450…
## $ Kentucky_Percent_UPB <dbl> NA, 2.72, 2.05, NA, 3.54, 2.78, 1.73,…
## $ Kentucky_Number_of_Loans <dbl> NA, 93, 96, NA, 170, 60, 38, 3, 7, 7,…
## $ Kentucky_Percent_Loans <dbl> NA, 2.92, 1.52, NA, 4.15, 3.18, 2.76,…
## $ Louisiana_Aggregate_UPB <dbl> NA, NA, NA, 41945902.2, 6226414.5, 20…
## $ Louisiana_Percent_UPB <dbl> NA, NA, NA, 0.71, 0.91, 0.55, NA, 2.9…
## $ Louisiana_Number_of_Loans <dbl> NA, NA, NA, 172, 36, 11, NA, 13, 25, …
## $ Louisiana_Percent_Loans <dbl> NA, NA, NA, 0.85, 0.88, 0.58, NA, 3.0…
## $ Maine_Aggregate_UPB <dbl> NA, 10475648.0, NA, NA, 6537219.2, 24…
## $ Maine_Percent_UPB <dbl> NA, 1.50, NA, NA, 0.96, 0.65, NA, NA,…
## $ Maine_Number_of_Loans <dbl> NA, 50, NA, NA, 43, 11, NA, NA, NA, N…
## $ Maine_Percent_Loans <dbl> NA, 1.57, NA, NA, 1.05, 0.58, NA, NA,…
## $ Maryland_Aggregate_UPB <dbl> NA, 7707617.6, 2016762.4, 165471733.3…
## $ Maryland_Percent_UPB <dbl> NA, 1.11, 1.72, 2.81, 2.40, 2.95, 2.5…
## $ Maryland_Number_of_Loans <dbl> NA, 30, 181, 540, 69, 46, 26, 8, 9, 5…
## $ Maryland_Percent_Loans <dbl> NA, 0.94, 2.87, 2.65, 1.68, 2.44, 1.8…
## $ Massachusetts_Aggregate_UPB <dbl> NA, 14046574.0, 1287181.9, 74057034.6…
## $ Massachusetts_Percent_UPB <dbl> NA, 2.02, 1.10, 1.26, 3.86, 5.67, 2.8…
## $ Massachusetts_Number_of_Loans <dbl> NA, 60, 109, 249, 121, 86, 30, NA, NA…
## $ Massachusetts_Percent_Loans <dbl> NA, 1.89, 1.73, 1.22, 2.95, 4.56, 2.1…
## $ Michigan_Aggregate_UPB <dbl> NA, 34407872.6, 9141046.4, 23341862.7…
## $ Michigan_Percent_UPB <dbl> NA, 4.94, 7.78, 0.40, 5.30, 1.74, 2.4…
## $ Michigan_Number_of_Loans <dbl> NA, 160, 259, 100, 251, 33, 43, 5, 9,…
## $ Michigan_Percent_Loans <dbl> NA, 5.03, 4.11, 0.49, 6.13, 1.75, 3.1…
## $ Minnesota_Aggregate_UPB <dbl> NA, 15069349.8, 2729141.3, 242325680.…
## $ Minnesota_Percent_UPB <dbl> NA, 2.16, 2.32, 4.11, 1.69, 4.97, 4.7…
## $ Minnesota_Number_of_Loans <dbl> NA, 72, 250, 956, 72, 91, 67, 16, 24,…
## $ Minnesota_Percent_Loans <dbl> NA, 2.26, 3.97, 4.70, 1.76, 4.82, 4.8…
## $ Mississippi_Aggregate_UPB <dbl> NA, NA, NA, NA, NA, NA, NA, 391517.46…
## $ Mississippi_Percent_UPB <dbl> NA, NA, NA, NA, NA, NA, NA, 0.97, 0.8…
## $ Mississippi_Number_of_Loans <dbl> NA, NA, NA, NA, NA, NA, NA, 4, 6, NA,…
## $ Mississippi_Percent_Loans <dbl> NA, NA, NA, NA, NA, NA, NA, 0.95, 0.8…
## $ Missouri_Aggregate_UPB <dbl> NA, 17264398.4, 3315209.0, 45593935.5…
## $ Missouri_Percent_UPB <dbl> NA, 2.48, 2.82, 0.77, 3.34, 3.73, 1.8…
## $ Missouri_Number_of_Loans <dbl> NA, 88, 182, 194, 160, 90, 32, 11, 22…
## $ Missouri_Percent_Loans <dbl> NA, 2.77, 2.89, 0.95, 3.91, 4.77, 2.3…
## $ Montana_Aggregate_UPB <dbl> NA, NA, NA, 30756797.6, NA, NA, NA, N…
## $ Montana_Percent_UPB <dbl> NA, NA, NA, 0.52, NA, NA, NA, NA, 1.4…
## $ Montana_Number_of_Loans <dbl> NA, NA, NA, 131, NA, NA, NA, NA, 10, …
## $ Montana_Percent_Loans <dbl> NA, NA, NA, 0.64, NA, NA, NA, NA, 1.4…
## $ Nebraska_Aggregate_UPB <dbl> NA, NA, NA, NA, 6667142.3, 2599290.0,…
## $ Nebraska_Percent_UPB <dbl> NA, NA, NA, NA, 0.97, 0.69, NA, NA, 1…
## $ Nebraska_Number_of_Loans <dbl> NA, NA, NA, NA, 37, 14, NA, NA, 14, 5…
## $ Nebraska_Percent_Loans <dbl> NA, NA, NA, NA, 0.90, 0.74, NA, NA, 2…
## $ Nevada_Aggregate_UPB <dbl> NA, NA, 763262.4, 31635586.2, NA, NA,…
## $ Nevada_Percent_UPB <dbl> NA, NA, 0.65, 0.54, NA, NA, NA, 0.95,…
## $ Nevada_Number_of_Loans <dbl> NA, NA, 37, 126, NA, NA, NA, 4, NA, N…
## $ Nevada_Percent_Loans <dbl> NA, NA, 0.59, 0.62, NA, NA, NA, 0.95,…
## $ `New Hampshire_Aggregate_UPB` <dbl> NA, 11190888.9, NA, NA, 7053649.9, 26…
## $ `New Hampshire_Percent_UPB` <dbl> NA, 1.61, NA, NA, 1.03, 0.70, 1.40, N…
## $ `New Hampshire_Number_of_Loans` <dbl> NA, 50, NA, NA, 34, 14, 14, NA, NA, N…
## $ `New Hampshire_Percent_Loans` <dbl> NA, 1.57, NA, NA, 0.83, 0.74, 1.02, N…
## $ `New Jersey_Aggregate_UPB` <dbl> NA, 6092436.5, 3134652.2, 260554205.8…
## $ `New Jersey_Percent_UPB` <dbl> NA, 0.87, 2.67, 4.42, 1.46, 2.56, 2.3…
## $ `New Jersey_Number_of_Loans` <dbl> NA, 22, 117, 810, 40, 36, 22, 6, 9, 7…
## $ `New Jersey_Percent_Loans` <dbl> NA, 0.69, 1.86, 3.98, 0.98, 1.91, 1.6…
## $ `New Mexico_Aggregate_UPB` <dbl> NA, NA, NA, NA, 5986529.0, NA, 214370…
## $ `New Mexico_Percent_UPB` <dbl> NA, NA, NA, NA, 0.87, NA, 0.89, 1.26,…
## $ `New Mexico_Number_of_Loans` <dbl> NA, NA, NA, NA, 35, NA, 13, 5, NA, NA…
## $ `New Mexico_Percent_Loans` <dbl> NA, NA, NA, NA, 0.85, NA, 0.94, 1.19,…
## $ `New York_Aggregate_UPB` <dbl> 559000004.5, 39097115.3, 1408295.2, 2…
## $ `New York_Percent_UPB` <dbl> 100.00, 5.61, 1.20, 4.82, 1.08, 0.87,…
## $ `New York_Number_of_Loans` <dbl> 2610, 172, 72, 861, 45, 16, 75, 10, 1…
## $ `New York_Percent_Loans` <dbl> 100.00, 5.41, 1.14, 4.23, 1.10, 0.85,…
## $ `North Carolina_Aggregate_UPB` <dbl> NA, 13156461.02, 5843971.84, 22001689…
## $ `North Carolina_Percent_UPB` <dbl> NA, 1.89, 4.97, 3.73, 1.62, 3.65, 4.6…
## $ `North Carolina_Number_of_Loans` <dbl> NA, 65, 187, 841, 59, 82, 66, 23, 33,…
## $ `North Carolina_Percent_Loans` <dbl> NA, 2.04, 2.97, 4.13, 1.44, 4.35, 4.8…
## $ `North Dakota_Aggregate_UPB` <dbl> NA, NA, NA, NA, NA, NA, 1943552, 6235…
## $ `North Dakota_Percent_UPB` <dbl> NA, NA, NA, NA, NA, NA, 0.81, 1.55, 1…
## $ `North Dakota_Number_of_Loans` <dbl> NA, NA, NA, NA, NA, NA, 16, 7, 9, NA,…
## $ `North Dakota_Percent_Loans` <dbl> NA, NA, NA, NA, NA, NA, 1.16, 1.66, 1…
## $ Ohio_Aggregate_UPB <dbl> NA, 45185886.21, 12619091.92, 4171642…
## $ Ohio_Percent_UPB <dbl> NA, 6.49, 10.74, 0.71, 4.30, 3.65, 3.…
## $ Ohio_Number_of_Loans <dbl> NA, 220, 213, 172, 199, 72, 61, 6, 6,…
## $ Ohio_Percent_Loans <dbl> NA, 6.92, 3.38, 0.85, 4.86, 3.82, 4.4…
## $ Oklahoma_Aggregate_UPB <dbl> NA, 25626381.2, NA, NA, 5035218.7, NA…
## $ Oklahoma_Percent_UPB <dbl> NA, 3.68, NA, NA, 0.74, NA, 0.78, 1.4…
## $ Oklahoma_Number_of_Loans <dbl> NA, 123, NA, NA, 35, NA, 16, 6, 15, N…
## $ Oklahoma_Percent_Loans <dbl> NA, 3.87, NA, NA, 0.85, NA, 1.16, 1.4…
## $ Oregon_Aggregate_UPB <dbl> NA, 6184463.0, 2692830.5, 211788102.2…
## $ Oregon_Percent_UPB <dbl> NA, 0.89, 2.29, 3.59, 0.94, 1.23, 1.9…
## $ Oregon_Number_of_Loans <dbl> NA, 27, 156, 826, 34, 24, 21, 10, 14,…
## $ Oregon_Percent_Loans <dbl> NA, 0.85, 2.47, 4.06, 0.83, 1.27, 1.5…
## $ Pennsylvania_Aggregate_UPB <dbl> NA, 19905185.3, 2581847.7, 228554584.…
## $ Pennsylvania_Percent_UPB <dbl> NA, 2.86, 2.20, 3.88, 2.72, 4.08, 3.0…
## $ Pennsylvania_Number_of_Loans <dbl> NA, 96, 173, 853, 103, 77, 50, 30, 49…
## $ Pennsylvania_Percent_Loans <dbl> NA, 3.02, 2.74, 4.19, 2.51, 4.08, 3.6…
## $ `Puerto Rico_Aggregate_UPB` <dbl> NA, NA, NA, NA, NA, NA, 2127839, NA, …
## $ `Puerto Rico_Percent_UPB` <dbl> NA, NA, NA, NA, NA, NA, 0.89, NA, NA,…
## $ `Puerto Rico_Number_of_Loans` <dbl> NA, NA, NA, NA, NA, NA, 25, NA, NA, N…
## $ `Puerto Rico_Percent_Loans` <dbl> NA, NA, NA, NA, NA, NA, 1.82, NA, NA,…
## $ `Rhode Island_Aggregate_UPB` <dbl> NA, NA, NA, NA, 4059645.8, NA, NA, NA…
## $ `Rhode Island_Percent_UPB` <dbl> NA, NA, NA, NA, 0.59, NA, NA, NA, NA,…
## $ `Rhode Island_Number_of_Loans` <dbl> NA, NA, NA, NA, 22, NA, NA, NA, NA, N…
## $ `Rhode Island_Percent_Loans` <dbl> NA, NA, NA, NA, 0.54, NA, NA, NA, NA,…
## $ `South Carolina_Aggregate_UPB` <dbl> NA, 7465108.4, 1185572.0, 66215049.8,…
## $ `South Carolina_Percent_UPB` <dbl> NA, 1.07, 1.01, 1.12, NA, 1.94, 1.09,…
## $ `South Carolina_Number_of_Loans` <dbl> NA, 34, 91, 253, NA, 35, 19, 17, 20, …
## $ `South Carolina_Percent_Loans` <dbl> NA, 1.07, 1.44, 1.24, NA, 1.85, 1.38,…
## $ `South Dakota_Aggregate_UPB` <dbl> NA, NA, NA, 26277148, NA, NA, NA, 501…
## $ `South Dakota_Percent_UPB` <dbl> NA, NA, NA, 0.45, NA, NA, NA, 1.25, 2…
## $ `South Dakota_Number_of_Loans` <dbl> NA, NA, NA, 121, NA, NA, NA, 6, 16, 1…
## $ `South Dakota_Percent_Loans` <dbl> NA, NA, NA, 0.59, NA, NA, NA, 1.43, 2…
## $ Tennessee_Aggregate_UPB <dbl> NA, 4415886.6, 2354480.0, 59796334.1,…
## $ Tennessee_Percent_UPB <dbl> NA, 0.63, 2.00, 1.01, 1.41, 2.31, 2.1…
## $ Tennessee_Number_of_Loans <dbl> NA, 20, 70, 233, 61, 42, 28, 20, 18, …
## $ Tennessee_Percent_Loans <dbl> NA, 0.63, 1.11, 1.14, 1.49, 2.23, 2.0…
## $ Texas_Aggregate_UPB <dbl> NA, 19369500.2, 3322309.9, 364820299.…
## $ Texas_Percent_UPB <dbl> NA, 2.78, 2.83, 6.19, 5.20, 2.16, 3.5…
## $ Texas_Number_of_Loans <dbl> NA, 90, 191, 1379, 186, 43, 46, 33, 6…
## $ Texas_Percent_Loans <dbl> NA, 2.83, 3.03, 6.78, 4.54, 2.28, 3.3…
## $ Utah_Aggregate_UPB <dbl> NA, 23855014.9, 5055112.4, 116990772.…
## $ Utah_Percent_UPB <dbl> NA, 3.42, 4.30, 1.98, 2.43, 4.15, 2.5…
## $ Utah_Number_of_Loans <dbl> NA, 115, 320, 460, 96, 85, 32, 14, 29…
## $ Utah_Percent_Loans <dbl> NA, 3.62, 5.08, 2.26, 2.34, 4.50, 2.3…
## $ Vermont_Aggregate_UPB <dbl> NA, 26386880.8, NA, NA, NA, 2548709.6…
## $ Vermont_Percent_UPB <dbl> NA, 3.79, NA, NA, NA, 0.67, NA, NA, N…
## $ Vermont_Number_of_Loans <dbl> NA, 116, NA, NA, NA, 12, NA, NA, NA, …
## $ Vermont_Percent_Loans <dbl> NA, 3.65, NA, NA, NA, 0.64, NA, NA, N…
## $ `Virgin Islands_Aggregate_UPB` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Virgin Islands_Percent_UPB` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Virgin Islands_Number_of_Loans` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Virgin Islands_Percent_Loans` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Virginia_Aggregate_UPB <dbl> NA, 19199244.1, 1722054.7, 200783139.…
## $ Virginia_Percent_UPB <dbl> NA, 2.76, 1.47, 3.41, 1.53, 3.57, 2.8…
## $ Virginia_Number_of_Loans <dbl> NA, 73, 171, 651, 55, 63, 33, 14, 15,…
## $ Virginia_Percent_Loans <dbl> NA, 2.30, 2.71, 3.20, 1.34, 3.34, 2.4…
## $ Washington_Aggregate_UPB <dbl> NA, 18532437.5, 3894258.4, 308971089.…
## $ Washington_Percent_UPB <dbl> NA, 2.66, 3.31, 5.24, 1.56, 2.06, 3.0…
## $ Washington_Number_of_Loans <dbl> NA, 75, 358, 1099, 50, 36, 33, 11, 19…
## $ Washington_Percent_Loans <dbl> NA, 2.36, 5.68, 5.40, 1.22, 1.91, 2.4…
## $ `West Virginia_Aggregate_UPB` <dbl> NA, NA, NA, NA, NA, NA, 1855672, NA, …
## $ `West Virginia_Percent_UPB` <dbl> NA, NA, NA, NA, NA, NA, 0.77, NA, NA,…
## $ `West Virginia_Number_of_Loans` <dbl> NA, NA, NA, NA, NA, NA, 12, NA, NA, N…
## $ `West Virginia_Percent_Loans` <dbl> NA, NA, NA, NA, NA, NA, 0.87, NA, NA,…
## $ Wisconsin_Aggregate_UPB <dbl> NA, 79304220.3, 1543769.7, 78820600.8…
## $ Wisconsin_Percent_UPB <dbl> NA, 11.39, 1.31, 1.34, 10.44, 7.49, 6…
## $ Wisconsin_Number_of_Loans <dbl> NA, 374, 225, 326, 487, 144, 89, 12, …
## $ Wisconsin_Percent_Loans <dbl> NA, 11.76, 3.57, 1.60, 11.89, 7.63, 6…
## $ Wyoming_Aggregate_UPB <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Wyoming_Percent_UPB <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Wyoming_Number_of_Loans <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Wyoming_Percent_Loans <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#data_filtered <- data %>%
# filter(year(`Purchase Date`) %in% c(2009:2012, 2019:2022))
There are 31,713 securities (CUSIPs) the Fed engaged with from 2009 to 2023. Each has columns for every state’s percentage of the total principal (“Percent_UPB”). For instance, Puerto Rico_Percent_UPB might be 0.89 for one security (meaning 0.89% of its mortgages are Puerto Rican). Many of these percentages are zero for a given state (not all pools include all states). We convert these to a long table of (CUSIP, State, percent) for network building. We also flag each security as “Concentrated” if any one state makes up >50% of its pool (versus “Diverse”). This will help see if highly localized pools behave differently.
# Reshape state-level UPB columns to long format
state_upb_long <- data_filtered %>%
select(CUSIP, `Purchase Date`, ends_with("_Aggregate_UPB")) %>%
pivot_longer(
cols = ends_with("_Aggregate_UPB"),
names_to = "State",
names_pattern = "(.*)_Aggregate_UPB",
values_to = "UPB"
) %>%
mutate(UPB = as.numeric(UPB),
Year = year(`Purchase Date`)) %>%
filter(!is.na(Year))
# Compute total UPB per CUSIP (summing across all states)
cusip_totals <- state_upb_long %>%
group_by(CUSIP) %>%
summarize(Total_UPB = sum(UPB, na.rm = TRUE), .groups = "drop")
# Merge total UPB back and calculate percentage per state
cusip_composition <- state_upb_long %>%
left_join(cusip_totals, by = "CUSIP") %>%
mutate(Percent_State_UPB = if_else(Total_UPB > 0, 100 * UPB / Total_UPB, 0))
state_year <- state_upb_long %>%
group_by(Year, State) %>%
summarize(Total_UPB = sum(UPB, na.rm=TRUE), .groups="drop")
# Identify top 5 states by total UPB across all years
top_states <- state_year %>%
group_by(State) %>%
summarize(Overall_UPB = sum(Total_UPB)) %>%
arrange(desc(Overall_UPB)) %>%
slice_head(n=5) %>%
pull(State)
top_states
## [1] "California" "Illinois" "Texas" "New York" "Florida"
The top five states by total Fed-purchased principal were California, Illinois, Texas, New York, and Florida. These are large states with big housing markets, so it makes sense. California in particular stands out as a mega-hub, with a huge share of Fed-supported mortgages. In network terms, California is a high-degree node connecting to many others The chart below plots each year’s purchased principal for all states (faceted) and highlights the top 5 states over time: Total UPB by State, 2009–2023. Each panel is a year; within a panel states are sorted by total principal the Fed bought that year. California (the long light-blue bars in many years) consistently dominates. Peaks correspond to Fed purchase waves (2009–2010 QE1, 2020 QE4), benefiting big states most. The Fed’s MBS buys surged in 2009–2010 and again in 2020, then tapered. California’s bar dwarfs others, especially in those waves – visually confirming it as a hub. New York, Illinois, Texas, and Florida also contribute large chunks. Many smaller states and territories appear as tiny slivers.
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
ggplot(state_year, aes(x = Total_UPB,
y = reorder(State, Total_UPB))) +
geom_col(fill = "steelblue") +
# convert raw dollars → billions and add “B” suffix
scale_x_continuous(
labels = label_number(
scale = 1e-9,
suffix = "B",
accuracy = 0.1
),
name = "Unpaid Principal Balance (USD billions)"
) +
labs(title = "Total UPB by State, 2009–2023", y = "State") +
facet_wrap(~ Year, ncol = 5, scales = "free_x") +
theme_minimal(base_size = 10) +
theme(
strip.text = element_text(size = 5),
axis.text.y = element_text(size = 2),
plot.title = element_text(hjust = 0.5)
)
library(ggplot2)
# Filter data for top 5 states and plot
state_year_top5 <- state_year %>% filter(State %in% top_states)
ggplot(state_year_top5, aes(x=Year, y=Total_UPB/1e9, color=State)) +
geom_line(size=1) + geom_point(size=2) +
labs(title="Aggregate UPB of Loans by State Over Time (Top 5 States)",
x="Year of Purchase", y="Total UPB (billion USD)") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
What about holding periods? The Fed doesn’t hold MBS
forever; many were sold or repaid. The distribution of holding days (for
transactions that ended) is:
# Calculate holding period (days) for each purchase
data <- data %>% mutate(HoldingDays = as.numeric(`Sale Date` - `Purchase Date`))
summary(data$HoldingDays)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 616 721 1201 1372 5019 5096
# Plot distribution of holding periods
ggplot(data, aes(x=HoldingDays)) +
geom_histogram(binwidth=180, fill="skyblue", color="black") +
geom_vline(xintercept=mean(data$HoldingDays, na.rm=TRUE), color="red", linetype="dashed") +
geom_vline(xintercept=median(data$HoldingDays, na.rm=TRUE), color="darkgreen", linetype="dashed") +
labs(title="Distribution of CUSIP Holding Periods", x="Holding Period (days)", y="Number of CUSIPs") +
theme_minimal()
## Warning: Removed 5096 rows containing non-finite outside the scale range
## (`stat_bin()`).
On average the Fed held an MBS ~3.3 years (1201 days) – but with huge variance (some under a month, some over 13 years). About 5,096 purchases were still held as of our data end (NA sale date). The red line marks the mean, green the median: Distribution of CUSIP Holding Periods. Most MBS were held 1–4 years. The mean (red) is higher due to a long tail of MBS the Fed held for 10+ years (often because borrowers didn’t refinance) The median (green) ~721 days.
# Average holding period by state
state_holdings <- state_upb_long %>%
filter(UPB > 0) %>%
select(CUSIP, State) %>%
distinct() %>%
left_join(data %>% select(CUSIP, HoldingDays), by="CUSIP") %>%
group_by(State) %>%
summarize(AvgHoldingDays = mean(HoldingDays, na.rm=TRUE), .groups="drop") %>%
arrange(AvgHoldingDays)
## Warning in left_join(., data %>% select(CUSIP, HoldingDays), by = "CUSIP"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
head(state_holdings, 5) # 5 states with shortest average holding period
## # A tibble: 5 × 2
## State AvgHoldingDays
## <chr> <dbl>
## 1 Virgin Islands 977.
## 2 Hawaii 1052.
## 3 Rhode Island 1105.
## 4 Delaware 1129.
## 5 Wyoming 1136.
tail(state_holdings, 5) # 5 states with longest average holding period
## # A tibble: 5 × 2
## State AvgHoldingDays
## <chr> <dbl>
## 1 Iowa 1330.
## 2 North Dakota 1336.
## 3 Oklahoma 1341.
## 4 Guam 1399.
## 5 Puerto Rico 2114.
# For each CUSIP, compute the maximum state percentage
cusip_exposure <- cusip_composition %>%
group_by(CUSIP) %>%
summarize(Max_Percent = max(Percent_State_UPB, na.rm = TRUE), .groups = "drop")
# Merge the exposure data with the holding period information
cusip_exposure <- cusip_exposure %>%
left_join(data_filtered %>% select(CUSIP, HoldingDays), by = "CUSIP") %>%
mutate(Exposure = if_else(Max_Percent > 50, "Concentrated", "Diverse"))
# Check summary
summary(cusip_exposure)
## CUSIP Max_Percent HoldingDays Exposure
## Length:31713 Min. : 2.843 Min. : 0 Length:31713
## Class :character 1st Qu.: 12.355 1st Qu.: 616 Class :character
## Mode :character Median : 20.340 Median : 721 Mode :character
## Mean : 27.038 Mean :1201
## 3rd Qu.: 35.049 3rd Qu.:1372
## Max. :100.000 Max. :5019
## NA's :5096
ggplot(cusip_exposure, aes(x = Exposure, y = HoldingDays, fill = Exposure)) +
geom_boxplot() +
labs(title = "Holding Period by UPB Exposure Concentration",
x = "Exposure Type",
y = "Holding Period (days)") +
theme_minimal()
## Warning: Removed 5096 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Now the fun part: connecting states into a network graph. We create a
bipartite incidence matrix incidence_mat of size (States × CUSIPs) where
each cell is the percentage of that MBS’s UPB from that state For
example, if Security X has 50% California loans,
incidence_mat[“California”, “X”]=50. Using the bipartite R package, we
treat this as a two-mode network
# 1) Build a small long‐format df with only positive %UPB
mbs_web_df <- cusip_composition%>%
filter(Percent_State_UPB > 0) %>%
select(State, CUSIP, Percent_State_UPB)
# 2) If you want exactly the same "gf" grouping, add it here (though acast doesn't use it):
mbs_web_df$grouping_factor <- "gf"
# 3) Use acast() to pivot to a matrix: rows = States, cols = CUSIPs
incidence_mat <- acast(
mbs_web_df,
State ~ CUSIP,
value.var = "Percent_State_UPB",
fun.aggregate = sum,
fill = 0
)
# 4) Coerce to "web" so plotweb() does CA under the hood
class(incidence_mat) <- c("web", class(incidence_mat))
# 5) Define your palette exactly as in the Women example
cols1 <- c(
'#8dd3c7','#ffffb3','#bebada','#fb8072',
'#80b1d3','#fdb462','#b3de69','#fccde5',
'#d9d9d9','#bc80bd','#ccebc5','#ffed6f'
)
# 6) Plot the bipartite graph with correspondence‐analysis layout, included image because plotting takes a while
#plotweb(
# incidence_mat, # our State×CUSIP %UPB matrix
# method = "cca", # same as the women example
## labsize = 1,
# text.rot = 90,
# col.interaction = cols1,
# bor.col.interaction = cols1
#)
#title("Full")
In the chunk above, I first used acast() to build incidence_mat, a matrix with rows = States and columns = CUSIPs, whose entries are the percent of unpaid principal balance (UPB) contributed by each state to each security. I then coerced that matrix to class “web” and called, plotweb( incidence_mat, method=“cca”, … ) which runs a Correspondence Analysis (CA) on these weighted link values.
Because we pass the actual % UPB into CA, securities that draw their balances from very similar state mixes are placed close together in the ordination, and likewise for states that have similar exposure profiles.
From the plot we can see major hubs like California and New York are noticiable by the extremely dense fans of lines radiating upward at their tick marks showing how they dominate many securities. Sub‑clusters of states (e.g. Midwest vs. Pacific vs. Northeast) whose line‑bundles overlap in similar regions, indicating geographic or market‐style groupings.Outlier territories (Guam, Puerto Rico) with only a few thin lines off to the side, reflecting their minimal footprint in the Fed’s MBS holdings.
# 0) Restrict to Concentrated CUSIPs and build binary long table
conc_ids <- cusip_exposure %>%
filter(Exposure == "Concentrated") %>%
pull(CUSIP)
mbs_data_bin <- state_upb_long %>%
filter(UPB > 0, CUSIP %in% conc_ids) %>%
mutate(
State = as.character(State),
CUSIP = as.character(CUSIP),
grouping_factor = "gf" # dummy factor
) %>%
select(State, CUSIP, grouping_factor)
# 1) Build the bipartite object on the full binary table
mbs_web_conc <- frame2webs(
mbs_data_bin,
varnames = c("State", "CUSIP", "grouping_factor"),
type.out = "list",
emptylist = TRUE
)
# 2) Extract & coerce the matrix
web_mat_conc <- mbs_web_conc$gf
#lass(web_mat_conc) <- c("web", class(web_mat_conc))
# 3) Define the 12‑color palette
cols1 <- c(
'#8dd3c7','#ffffb3','#bebada','#fb8072',
'#80b1d3','#fdb462','#b3de69','#fccde5',
'#d9d9d9','#bc80bd','#ccebc5','#ffed6f'
)
# 4) Plot with correspondence‐analysis layout
plotweb(
web_mat_conc,
method = "cca",
labsize = 1,
x.lim = c(0, 3.45),
y.lim = c(-0.2, 2.2),
text.rot = 90,
col.interaction = cols1,
bor.col.interaction = cols1
)
title("Full Concentrated CUSIPs Network (Binary Presence)")
exposed <- cusip_exposure %>% left_join(cusip_composition, by = "CUSIP")
## Warning in left_join(., cusip_composition, by = "CUSIP"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 135310 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
library(dplyr)
library(reshape2) # for acast()
library(bipartite) # for plotweb()
# 1) Grab only Concentrated CUSIPs with their %UPB weights
wt_df <- exposed %>%
filter(Exposure == "Concentrated", Percent_State_UPB > 0) %>%
rename(
State = State, # keep the State column
CUSIP = CUSIP, # keep the CUSIP column
weight = Percent_State_UPB # our numeric link weight
) %>%
select(State, CUSIP, weight)
# 2) Pivot to a matrix: rows = States, cols = CUSIPs, entries = weight
wt_mat <- acast(
wt_df,
State ~ CUSIP,
value.var = "weight",
fun.aggregate = sum, # in case any dupes
fill = 0 # empty slots = 0
)
# 3) Give it the “web” class so plotweb() knows what to do
class(wt_mat) <- c("web", class(wt_mat))
# 4) (Optionally) scale your link‐widths
link_w <- wt_mat / max(wt_mat) # between 0 and 1
link_w <- 0.5 + 2.5 * link_w # now in [0.5,3]
# 5) Pick a color for all links
link_col <- "steelblue"
# 6) Plot the weighted bipartite we
# 1) Create an igraph from your weighted incidence matrix
g_bi <- graph_from_incidence_matrix(wt_mat, weighted = TRUE)
# 2) Separate States (type=FALSE) vs. CUSIPs (type=TRUE)
V(g_bi)$color <- ifelse(V(g_bi)$type, "tomato", "skyblue")
V(g_bi)$size <- ifelse(V(g_bi)$type, 4, 2)
# 3) Plot with variable edge widths
plot(
g_bi,
layout = layout_as_bipartite(g_bi),
vertex.label = NA,
edge.width = E(g_bi)$weight / max(E(g_bi)$weight) * 3,
edge.color = "grey60",
main = "Weighted Bipartite: States ↔ Concentrated CUSIPs"
)
legend(
"topleft",
legend = c("CUSIP","State"),
pch = 21,
pt.bg = c("tomato","skyblue"),
pt.cex = 2,
bty = "n"
)
library(igraph)
# 1) Reconstruct the bipartite igraph directly from the weighted incidence matrix
# wt_mat is your State×CUSIP matrix of Percent_State_UPB
g_bip <- graph_from_incidence_matrix(
wt_mat,
weighted = TRUE, # preserve weights
mode = "all" # makes an undirected bipartite graph
)
# Sanity-check: bipartite types
# FALSE = “row” nodes (States), TRUE = “column” nodes (CUSIPs)
table(V(g_bip)$type)
##
## FALSE TRUE
## 53 3632
# 2) Project to the CUSIP–CUSIP network, summing co-occurrences
projs <- bipartite_projection(g_bip, multiplicity = TRUE)
g_cusip <- projs$proj2 # by default proj2 are the “column” (CUSIP) nodes
# 3) Compute un‐weighted degree **via igraph** and grab the top 100
deg <- sort( igraph::degree(g_cusip), decreasing = TRUE )
top100 <- names(deg)[ seq_len(min(100, length(deg))) ]
# 4) Get their numeric vertex IDs and induce subgraph
vids100 <- match(top100, V(g_cusip)$name)
g_top <- igraph::induced_subgraph(g_cusip, vids100)
# 5) Layout + plot
lay <- igraph::layout_with_fr(g_top)
plot(
g_top,
layout = lay,
vertex.size = 5 + 5 * (igraph::degree(g_top) / max(igraph::degree(g_top))),
vertex.color = "tomato",
vertex.label = NA,
edge.width = E(g_top)$weight / max(E(g_top)$weight) * 2,
edge.color = "gray80",
main = "Top 100 Concentrated CUSIPs by Unweighted Degree"
)
library(igraph)
# g_bi from before:
g_bi <- graph_from_incidence_matrix(wt_mat, weighted = TRUE)
# project to get state‐state graph:
projs <- bipartite_projection(g_bi, multiplicity = TRUE)
g_states <- projs$proj2 # assuming 'proj2' are the FALSE‐type vertices = States
# simplify/threshold to the top‐50 heaviest edges
E(g_states)$weight <- E(g_states)$weight
g_trim <- delete_edges(g_states, E(g_states)[weight < quantile(weight, .75)])
# plot
plot(
g_trim,
layout = layout_with_fr,
edge.width = E(g_trim)$weight / max(E(g_trim)$weight) * 5,
vertex.size= 8,
vertex.label.cex=0.8,
vertex.color="skyblue",
edge.color= "grey60",
main="State–State Co‐Exposure Network\n(top 25% shared CUSIPs)"
)
degreedistr(web_mat_conc, plot.it=TRUE, level="both")
## $`lower level dd fits`
## Estimate Std. Error Pr(>|t|) R2 AIC
## exponential NA NA NA NA NA
## power law 0.26010915 0.023836446 1.025349e-14 0.8435128 -37.13211
## truncated power law 0.02272253 0.006239741 6.629036e-04 0.9975564 -245.51701
##
## $`higher level dd fits`
## Estimate Std. Error Pr(>|t|) R2 AIC
## exponential 0.24162051 0.005068736 1.980468e-33 0.9983363 -193.51829
## power law 0.89352092 0.052767550 1.951829e-18 0.9580854 -84.65123
## truncated power law 0.05439776 0.040304364 1.860389e-01 0.9982016 -193.07138
#networklevel(web_mat_conc)
# Network‐level (only key metrics)
net_conc <- networklevel(
web_mat_conc,
index = c("connectance","nestedness"),
level = "both",
weighted = TRUE
)
To analyze state-to-state relations, we “project” the bipartite network to a one-mode state network, connecting states that share securities. Specifically, we compute a weighted adjacency matrix state_adj where entry (i,j) is the sum of products of state i’s and state j’s percentages in each security. This captures how strongly i and j are linked via common pools. In code:
state_adj_conc <- web_mat_conc %*% t(web_mat_conc)
g_state_conc <- graph_from_adjacency_matrix(
state_adj_conc,
mode = "undirected",
weighted= TRUE,
diag = FALSE
)
vcount(g_state_conc) # number of nodes
## [1] 53
edge_density(g_state_conc) # proportion of possible edges present
## [1] 0.9702467
#components(g_state_conc) # connected components
diameter(g_state_conc) # longest shortest‐path
## [1] 11
transitivity(g_state_conc) # global clustering coefficient
## [1] 0.979088
library(igraph)
# 2.1 Cluster with the Louvain algorithm
comm_louvain <- cluster_louvain(g_state_conc, weights = E(g_state_conc)$weight)
# 2.2 How many communities?
length(comm_louvain)
## [1] 4
# 2.3 Modularity score
modularity(comm_louvain)
## [1] 0.04117125
# assign membership as a vertex attribute
V(g_state_conc)$community <- membership(comm_louvain)
# pick a palette
pal <- RColorBrewer::brewer.pal(max(V(g_state_conc)$community), "Set3")
plot(
g_state_conc,
vertex.color = pal[V(g_state_conc)$community],
vertex.label = V(g_state_conc)$name,
vertex.size = 5,
edge.width = 0.5,
edge.color = "grey80",
main = paste0("Louvain Communities (Q=", round(modularity(comm_louvain), 3),")")
)
We found about 5 state communities. Modularity Q ≈ 0.28, meaning the
network has a meaningful but not extreme community structure (0.28
indicates more clustering than random Plotting the network with nodes
colored by community gave this result: Louvain Communities of State
Network (Q=0.279). Puerto Rico stands isolated, as do a few other
territories (Guam, Virgin Islands). The main continental U.S. clusters
densely in the bottom-right blob (labels overlap due to graph density).
This visualization (while cluttered) suggests Puerto Rico doesn’t share
MBS with others forming its own one-node “community.”
# keep only strong links
g_thresh <- delete_edges(
g_state_conc,
E(g_state_conc)[ weight < 5 ] # e.g. fewer than 5 shared CUSIPs
)
Interpreting the clusters: states tend to group by region. In our analysis, we observed: a Pacific/West cluster (CA, HI, AZ, NV, etc.), a Northeast cluster (NY, NJ, MA, PA, etc.), a Midwest cluster (IL, OH, MI, WI, etc.), a Southern cluster (FL, GA, AL, MS, etc.), and a Mountain/Plains cluster (TX, CO, KS, etc.). These align with known regional lending patterns. Mortgages in the same MBS often come from either geographically proximate states or states with similar loan programs (e.g., many rural states might share government-loan pools).
library(igraph)
# 2.1 Cluster with the Louvain algorithm
comm_louvain1 <- cluster_louvain(g_thresh, weights = E(g_state_conc)$weight)
# 2.2 How many communities?
length(comm_louvain1)
## [1] 8
# 2.3 Modularity score
modularity(comm_louvain1)
## [1] 0.1210441
sizes(comm_louvain1)
## Community sizes
## 1 2 3 4 5 6 7 8
## 15 7 10 2 1 7 7 4
membership(comm_louvain1)
## Alabama Alaska Arizona Arkansas California
## 1 2 1 3 1
## Colorado Connecticut Delaware Florida Georgia
## 3 3 2 4 3
## Guam Hawaii Idaho Illinois Indiana
## 5 6 6 3 7
## Iowa Kansas Kentucky Louisiana Maine
## 2 2 3 3 1
## Maryland Massachusetts Michigan Minnesota Mississippi
## 1 3 1 8 8
## Missouri Montana Nebraska Nevada New Hampshire
## 2 1 1 8 3
## New Jersey New Mexico New York North Carolina North Dakota
## 8 2 1 6 1
## Ohio Oklahoma Oregon Pennsylvania Puerto Rico
## 1 7 6 6 7
## Rhode Island South Carolina South Dakota Tennessee Texas
## 4 7 6 1 2
## Utah Vermont Virgin Islands Virginia Washington
## 7 7 1 7 6
## West Virginia Wisconsin Wyoming
## 3 1 1
Hubs and connectivity: California sits at the center of the West cluster, connected to many states (because pools nationwide often include some California loans – CA has a huge volume). New York similarly anchors the Northeast cluster. These hubs are the high-degree nodes we expectedSmaller states like Puerto Rico or Guam ended up nearly isolated – the Fed did hold some PR loans, but those securities contained almost no other states’ loans, leaving PR with few connections (only minimal links where PR loans co-occurred with, say, one or two other places in a pool). In fact, PR’s isolation in the graph above (the lone blue “Puerto Rico” node) confirms it rarely shares MBS with the mainland. Puerto Rico and Guam formed their own micro-community, an intriguing finding, likely because mortgages in PR and Guam are often packaged separately (these territories have unique housing markets that investors treat as distinct (newyorkfed.org).
# 1.1 Get the list of all CUSIPs in which Puerto Rico has any UPB
pr_cusips <- cusip_composition %>%
filter(State == "Puerto Rico", UPB > 0) %>%
pull(CUSIP) %>%
unique()
# 1.2 Extract their holding periods from the main data
pr_holdings <- data %>%
filter(CUSIP %in% pr_cusips) %>%
select(CUSIP, `Purchase Date`, `Sale Date`, HoldingDays)
# 1.3 Quick summary
summary(pr_holdings$HoldingDays)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 672 1736 2114 3766 4977 41
# 1.4 Number of unique PR CUSIPs
length(pr_cusips)
## [1] 285
Results: State Connectivity and Outliers
Our network analysis uncovers a few key insights: Regional Clusters: States cluster by geography in the Fed’s MBS network. The Fed’s portfolio effectively linked together neighboring or economically similar states more often This suggests MBS pools often comprise regionally concentrated loans (perhaps due to lenders packaging nearby loans together). For example, our Louvain algorithm found a distinct Midwest community, separate from a coastal community – reminiscent of how social networks have cohesive subgroups.
Dominant Hubs: A few large states are connected to nearly everyone. California’s mortgages appear in a huge number of Fed MBS, tying CA to 46 other states (by our unweighted measure). New York isn’t far behind. In network terms, these hubs have high degree and weight – the Fed’s support was broadest for mortgages from these states, giving them many “neighbors” in the network. Such hub-and-spoke structure is common in affiliation networks
(here, California is the “Hollywood” of mortgages – co-starring with almost every other state in some MBS!). This hub phenomenon yields a heavy-tailed connection distribution, which network scientists interpret as a sign of heterogeneity (few nodes with disproportionately many links)
Peripheral Outliers: Puerto Rico (and Guam, etc.) form tiny spokes barely attached to the network. Puerto Rico loans rarely co-mingled with mainland loans
Essentially, when the Fed bought pools of Puerto Rican mortgages, those pools were almost entirely PR-based (e.g., a Ginnie Mae pool of PR FHA loans). So PR stands alone, a nearly isolated node. In community detection, PR ended up in its own one-state communityThis minimal footprint in the Fed’s MBS holdings reflects how niche PR’s mortgage market is relative to the broader US From a policy view, one could say Fed support didn’t diffuse into PR’s network – it was self-contained.
Holding Periods vs Concentration: We noticed that MBS heavily concentrated in one state (like PR pools) tended to be held longer by the Fed. Puerto Rico is an extreme case: the Fed held its PR MBS an average of ~5.8 years, far above the overall median 2 years
In general, our data showed Concentrated pools (>50% in one state) had somewhat longer holding periods than Diverse pools (we confirmed this with a boxplot analysis, which showed a higher median for concentrated pools). One explanation: highly localized pools (especially from weaker economies) prepay slower (homeowners less often refinance or move), leaving the Fed holding the bag longer.
This project is a glimpse of how blending data science, network theory, and domain knowledge can yield insights. The Fed’s actions, though national in scope, had a spatial imprint. Visualizing that as a network makes the intangible (like $1.7 trillion of support (ginniemae.gov) more concrete: you can see which states were pulled together by the safety net and which dangled alone. For policymakers, this underscores that broad interventions may still have uneven regional outcomes. For network scientists, it’s a case study in bipartite community detection in a real financial system.